Reinforcement Learning: Insights from Interesting Failures in Parameter Selection

Authors

  • Wolfgang Konen
  • Thomas Bartz-Beielstein
Abstract

We investigate reinforcement learning methods, specifically the temporal difference learning algorithm TD(λ), on game-learning tasks. Small modifications in algorithm setup and parameter choice can have a significant impact on whether learning succeeds or fails. We demonstrate that small differences in input features significantly influence the learning process. By selecting the right feature set, we obtained good results within only 1/100 of the learning steps reported in the literature. We develop different metrics for measuring success in a reproducible manner and discuss why linear output functions are often preferable to sigmoid output functions.
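For readers unfamiliar with TD(λ), the following is a minimal sketch of the update rule with a linear output and accumulating eligibility traces. It is not the authors' setup: the 5-state random walk, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen purely for illustration, whereas the paper studies game-learning tasks.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.02, lam=0.8, gamma=1.0, seed=0):
    """TD(lambda) with one-hot features (linear output) on a 5-state random walk.

    Illustrative assumption, not the paper's task: states 0..4 are
    non-terminal, leaving on the left pays reward 0, leaving on the right
    pays reward 1, so the true value of state i is (i + 1) / 6.
    """
    rng = random.Random(seed)
    n = 5
    w = [0.5] * n  # one weight per state; the value estimate of state s is w[s]
    for _ in range(episodes):
        e = [0.0] * n  # eligibility traces, reset at the start of each episode
        s = n // 2     # every episode starts in the middle state
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:            # fell off the left end
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n:         # fell off the right end
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, w[s_next], False
            delta = reward + gamma * v_next - w[s]  # TD error
            for i in range(n):
                e[i] *= gamma * lam   # decay all traces
            e[s] += 1.0               # accumulate trace for the visited state
            for i in range(n):
                w[i] += alpha * delta * e[i]  # credit recently visited states
            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

With a linear output the value estimate is the weighted feature sum itself; a sigmoid output would instead pass that sum through a squashing function before computing the TD error. Changing `alpha` or `lam` in this sketch is a simple way to see how sensitive learning can be to parameter choice.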


Similar resources

Stock Price Prediction using Machine Learning and Swarm Intelligence

Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...


Reinforcement Learning in Multi-agent Games

This article investigates the performance of independent reinforcement learners in multiagent games. Convergence to Nash equilibria and parameter settings for desired learning behavior are discussed for Q-learning, Frequency Maximum Q value (FMQ) learning and lenient Q-learning. FMQ and lenient Q-learning are shown to outperform regular Q-learning significantly in the context of coordination ga...


Machine Learning for Bandwidth Management in Decentralized Networks

The successful operation of a peer-to-peer network depends on the resilience of its peer’s communications. On the Internet, direct connections between peers are often limited by restrictions like NATs and traffic filtering. Addressing such problems is particularly pressing for peer-to-peer networks that do not wish to rely on any trusted infrastructure, which might otherwise help the participan...


Dynamic Adjustment of the Motivation Degree in an Action Selection Mechanism

This paper presents a model for dynamic adjustment of the motivation degree, using a reinforcement learning approach, in an action selection mechanism previously developed by the authors. The learning takes place in the modification of a parameter of the model of combination of internal and external stimuli. Experiments that show the claimed properties are presented, using a VR simulation devel...


Density-Adaptive Learning and Forgetting

We describe a density-adaptive reinforcement learning and a density-adaptive forgetting algorithm. This learning algorithm uses hybrid k-D/2k-trees to allow for a variable resolution partitioning and labelling of the input space. The density adaptive forgetting algorithm deletes observations from the learning set depending on whether subsequent evidence is available in a local region of the par...




Publication date: 2008